German Indexing and Retrieval Test Data Base (GIRT) - Some Results of the Pre-test
Abstract
The project German Indexing and Retrieval Test Data Base (GIRT) provides a framework for comparing indexing and retrieval systems using German scientific information drawn from the social sciences. The aims and methods of the test environment are described. Some results of the pre-test, which was carried out with the retrieval systems Messenger and freeWAISsf, are shown and their consequences are discussed. The results are presented with respect to the precision and recall values, the intersection of hits and the distribution of hits across both systems, and the way the test persons transformed the queries into searches.

1 General Aspects of GIRT

The project German Indexing and Retrieval Test Data Base (GIRT) provides a framework for a comprehensive comparison of indexing and retrieval systems. By including intelligent indexing and software ergonomics, the power of the systems can be judged against that of customary systems. Indexing and retrieval systems that already exist or are being developed are to be tested with respect to their capacity and usability, especially in the area of scientific information.

On the retrieval side, the information science community broadly accepts that quantitative statistical systems are superior to purely Boolean methods. This is confirmed by several tests, especially in the context of TREC (Text Retrieval Conference): Experiments have shown that vector space models and probabilistic models (i.e. models that offer a ranking functionality in the output) achieve better results than exact match models. [1] But the results of these systems are still insufficient, because they are unsatisfactory particularly with respect to the qualitative results (mainly the low overlap of the results of different systems). Therefore the quantitative statistical systems are still discussed critically.
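The measures named above (precision, recall, and the overlap of hits between two systems) can be illustrated with a minimal sketch. The document identifiers and result lists below are invented for illustration and are not GIRT data; the overlap is computed here as intersection over union, which is one common choice and only an assumption about how the intersection of hits might be quantified.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def overlap(hits_a, hits_b):
    """Share of documents found by both systems among all documents found by either."""
    a, b = set(hits_a), set(hits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical relevance judgements and result lists for a single query.
relevant = {"d1", "d2", "d3", "d4"}
system_a = ["d1", "d2", "d5", "d6"]
system_b = ["d2", "d3", "d7"]

p_a, r_a = precision_recall(system_a, relevant)
p_b, r_b = precision_recall(system_b, relevant)
shared = overlap(system_a, system_b)

print(f"System A: precision={p_a:.2f} recall={r_a:.2f}")
print(f"System B: precision={p_b:.2f} recall={r_b:.2f}")
print(f"Overlap of result sets: {shared:.2f}")
```

In this toy case both systems reach a recall of 0.5 while sharing only one document, which is the kind of low overlap between systems that the text refers to.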
The present discussion about the future of intellectual indexing and about the power of automatic indexing and retrieval systems relies on surveys and experiments carried out in the English-speaking world, above all within the TREC initiative mentioned above. But the results cannot be transferred directly to the situation of German-language scientific information (and, within that, to the IZ): in the framework of TREC, the test data consist mainly of English text and of newsletter and newswire text, which make different demands on the search process than reference retrieval in a scientific data base and in German texts. Comparable test results that rely on research conducted with data bases from the scientific information area, and that at the same time reflect the specific problems of domain-specific terminology, do not exist. Surveys on German materials have so far rarely been carried out, but it is necessary to test the existing TREC results concerning the linguistic components of automatic systems with German-language data.

With the construction of the GIRT test data base and the offer of test facilities, we want to remedy those two deficits and to provide a solid basis for comparing automatic and intellectual indexing. The need to optimise their own resources and to keep pace with current technical developments and current research makes it worthwhile for information centres like the IZ to gather information on an empirical basis and to stimulate the development of such systems. It is necessary to recognise the advantages and the problems of the different systems in practical tests, and then to derive the criteria for selecting and combining the different methods or modules.

[1] Womser-Hacker 1996, p. 19 [translated from German]
[2] Shaw/Burgin/Howell 1997 state: Retrieval performance in traditional and TREC test collections is generally mediocre; recall, precision, and effectiveness are rarely greater than 0.50.
Similar resources
University of Hagen at CLEF 2004: Indexing and Translating Concepts for the GIRT Task
This paper describes the work done at the University of Hagen for our participation at the German Indexing and Retrieval Test (GIRT) task of the CLEF 2004 evaluation campaign. We conducted both monolingual and bilingual information retrieval experiments. For monolingual experiments with the German document collection, the focus is on applying and comparing three indexing methods targeting full ...
Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus
The development of the evaluation of domain-specific cross-language information retrieval (CLIR) is shown in the context of the Cross-Language Evaluation Forum (CLEF) campaigns from 2000 to 2003. The pre-conditions and the usable data and additionally available instruments are described. The main goals of this task of CLEF are to allow the evaluation of Cross-Language Information Retrieval (CLI...
Die GIRT-Testdatenbank als Gegenstand informationswissenschaftlicher Evaluation
The motivations behind the creation of the GIRT test database are described and an overview of the structure of the different versions of GIRT and their use is given. The way in which GIRT has been employed in various information science contexts from 1997 to 2003 is then illustrated with a short description of methods and procedures used. The paper concludes with a summary of the trends in the...
Domain-Specific Track CLEF 2005: Overview of Results and Approaches, Remarks on the Assessment Analysis
The domain-specific track aims at mono- and cross-language information retrieval on structured scientific data. This track studies retrieval in a domain-specific context using two social science databases: The German Indexing and Retrieval Testdatabase (GIRT) (fourth version, GIRT-4: German/English pseudo-parallel corpus with identical documents) with 302,638 documents in total, and the Russian Soc...
Exploring the Potential of Semantic Relatedness in Information Retrieval
Employing lexical-semantic knowledge in information retrieval (IR) is recognised as a promising way to go beyond bag-of-words approaches to IR. However, it has not yet become a standard component of IR systems due to many difficulties which arise when knowledge-based methods are applied in IR. In this paper, we explore the use of semantic relatedness in IR computed on the basis of GermaNet, a G...